15 research outputs found

    Robust Subspace Estimation Using Low-rank Optimization. Theory And Applications In Scene Reconstruction, Video Denoising, And Activity Recognition.

    In this dissertation, we discuss the problem of robust linear subspace estimation using low-rank optimization and propose three formulations of it. We demonstrate how these formulations can be used to solve fundamental computer vision problems, and provide superior performance in terms of accuracy and running time. Consider a set of observations extracted from images (such as pixel gray values, local features, trajectories, etc.). If the assumption that these observations are drawn from a linear subspace (or can be linearly approximated) is valid, then the goal is to represent each observation as a linear combination of a compact basis, while maintaining a minimal reconstruction error. One of the earliest, yet most popular, approaches to achieve that is Principal Component Analysis (PCA). However, PCA can only handle Gaussian noise, and thus suffers when the observations are contaminated with gross and sparse outliers. To this end, in this dissertation, we focus on estimating the subspace robustly using low-rank optimization, where the sparse outliers are detected and separated through the ℓ1 norm. The robust estimation has a two-fold advantage: First, the obtained basis better represents the actual subspace because it does not include contributions from the outliers. Second, the detected outliers are often of specific interest in many applications, as we will show throughout this thesis. We demonstrate four different formulations and applications for low-rank optimization. First, we consider the problem of reconstructing an underwater sequence by removing the turbulence caused by the water waves. The main drawback of most previous attempts to tackle this problem is that they heavily depend on modelling the waves, which in fact is ill-posed since the actual behavior of the waves along with the imaging process are complicated and include several noise components; therefore, their results are not satisfactory.
In contrast, we propose a novel approach which outperforms the state of the art. The intuition behind our method is that in a sequence where the water is static, the frames would be linearly correlated. Therefore, in the presence of water waves, we may consider the frames as noisy observations drawn from the subspace of linearly correlated frames. However, the noise introduced by the water waves is not sparse, and thus cannot directly be detected using low-rank optimization. Therefore, we propose a data-driven two-stage approach, where the first stage “sparsifies” the noise, and the second stage detects it. The first stage leverages the temporal mean of the sequence to overcome the structured turbulence of the waves through an iterative registration algorithm. The result of the first stage is a high quality mean and a better structured sequence; however, the sequence still contains unstructured sparse noise. Thus, we employ a second stage at which we extract the sparse errors from the sequence through rank minimization. Our method converges faster, and drastically outperforms the state of the art on all testing sequences. Secondly, we consider a closely related situation where an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in desert battlefields, where atmospheric turbulence is typically present, in addition to independently moving targets. Typical approaches for turbulence mitigation follow averaging or de-warping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects which can often be of great interest. Therefore, we address the problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulence sequence into three components: the background, the turbulence, and the object.
We simplify this extremely difficult problem into a minimization of nuclear norm, Frobenius norm, and ℓ1 norm. Our method is based on two observations: First, the turbulence causes dense and Gaussian noise, and therefore can be captured by Frobenius norm, while the moving objects are sparse and thus can be captured by ℓ1 norm. Second, since the object’s motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted with atmospheric turbulence and include extremely tiny moving objects. In addition to robustly detecting the subspace of the frames of a sequence, we consider using trajectories as observations in the low-rank optimization framework. In particular, in videos acquired by moving cameras, we track all the pixels in the video and use that to estimate the camera motion subspace. This is particularly useful in activity recognition, which typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicates the tracking stage, resulting in cluttered and incorrect tracks. In contrast, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned difficulties. Our approach is based on Lagrangian particle trajectories which are a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene. This is done in frames of unaligned video, and no object detection is required. In order to handle the moving camera, we decompose the trajectories into their camera-induced and object-induced components.
Having obtained the relevant object motion trajectories, we compute a compact set of chaotic invariant features, which captures the characteristics of the trajectories. Consequently, an SVM is employed to learn and recognize the human actions using the computed motion features. We performed intensive experiments on multiple benchmark datasets, and obtained promising results. Finally, we consider a more challenging problem referred to as complex event recognition, where the activities of interest are complex and unconstrained. This problem typically poses significant challenges because it involves videos of highly variable content, noise, length, frame size, etc. In this extremely challenging task, high-level features have recently shown a promising direction as in [53, 129], where core low-level events referred to as concepts are annotated and modelled using a portion of the training data, then each event is described using its content of these concepts. However, because of the complex nature of the videos, both the concept models and the corresponding high-level features are significantly noisy. In order to address this problem, we propose a novel low-rank formulation, which combines the precisely annotated videos used to train the concepts, with the rich high-level features. Our approach finds a new representation for each event, which is not only low-rank, but also constrained to adhere to the concept annotation, thus suppressing the noise, and maintaining a consistent occurrence of the concepts in each event. Extensive experiments on the large-scale, real-world TRECVID Multimedia Event Detection 2011 and 2012 datasets demonstrate that our approach consistently improves the discriminativity of the high-level features by a significant margin.
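
The low-rank plus sparse separation described in this abstract is commonly posed as principal component pursuit: minimize ‖L‖* + λ‖S‖1 subject to M = L + S, where the columns of M are the vectorized observations. The sketch below is a generic solver of that textbook problem using alternating shrinkage steps; it is not one of the dissertation's formulations. The function names, the default λ = 1/sqrt(max(m, n)), and the fixed penalty μ are assumptions taken from common practice.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Shrink the singular values: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into a low-rank part L and a sparse part S (hypothetical helper)."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # assumed default weight on the l1 term
    if mu is None:
        mu = 0.25 * m * n / np.abs(M).sum()   # assumed fixed penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # Lagrange multiplier for M = L + S
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

For a video, each column of `M` would hold one vectorized frame, so `L` approximates the static background and `S` collects the sparse outliers.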

    HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences

    We present a new descriptor for activity recognition from videos acquired by a depth sensor. Previous descriptors mostly compute shape and motion features independently; thus, they often fail to capture the complex joint shape-motion cues at pixel-level. In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates. To build the histogram, we create 4D projectors, which quantize the 4D space and represent the possible directions for the 4D normal. We initialize the projectors using the vertices of a regular polychoron. We then refine the projectors using a discriminative density measure, such that additional projectors are induced in the directions where the 4D normals are denser and more discriminative. Through extensive experiments, we demonstrate that our descriptor better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.
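
To make the idea of a 4D normal histogram concrete, the following is a minimal sketch, not the authors' implementation. It treats the depth sequence as a surface z = f(x, y, t), whose normal is proportional to (-∂z/∂x, -∂z/∂y, -∂z/∂t, 1), and accumulates the positive projections of the unit normals onto a fixed set of projectors. Using the 8 vertices of the 16-cell as projectors, accumulating over the whole sequence rather than per cell, and all function names are assumptions made for brevity; the refinement step is omitted.

```python
import numpy as np

def normals_4d(depth):
    """Unit 4D surface normals for a depth sequence of shape (T, H, W)."""
    dz_dt, dz_dy, dz_dx = np.gradient(np.asarray(depth, dtype=float))
    n = np.stack([-dz_dx, -dz_dy, -dz_dt, np.ones_like(dz_dx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def projectors_16cell():
    """Vertices of the 16-cell (+/- unit axes), one regular polychoron (assumed choice)."""
    eye = np.eye(4)
    return np.vstack([eye, -eye])             # shape (8, 4)

def hon4d_histogram(depth, projectors):
    """Normalized histogram of 4D normals over the given projectors."""
    n = normals_4d(depth).reshape(-1, 4)
    votes = np.maximum(n @ projectors.T, 0.0)  # keep only positive projections
    hist = votes.sum(axis=0)
    return hist / hist.sum()
```

A static, fronto-parallel surface puts all of its mass in the bin of the (0, 0, 0, 1) projector, while motion or slanted surfaces spread mass to the other bins.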

    Horizon constraint for unambiguous UAV navigation in planar scenes

    When the UAV goes to high altitudes such that the observed surface of the earth becomes planar, the structure and motion recovery of the earth's moving plane becomes ambiguous. This planar degeneracy has been pointed out very often in the literature; therefore, current navigation methods either completely fail or give many confusing solutions in such scenarios. Interestingly, the horizon line in planar scenes is straight and distinctive; hence, easily detected. Therefore, we show in this paper that the horizon line provides two degrees of freedom that control the relative orientation between the camera coordinate system and the local surface of earth. The recovered degrees of freedom help linearize and disambiguate the planar flow, and therefore we obtain a unique solution for the UAV motion estimation. Unlike previous work which used the horizon to provide the roll angle and the pitch percentage and only employed them for flight stability, we extract the exact angles and directly use them to estimate the ego motion. Additionally, we propose a novel horizon detector based on the maximum a posteriori estimation of both motion and appearance features which outperforms the other detectors in planar scenarios. We thoroughly evaluated the proposed method against information from GPS and gyroscopes, and obtained promising results. © 2011 IEEE
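
The two degrees of freedom carried by the horizon can be illustrated with standard projective geometry: the horizon is the vanishing line l of the ground plane, and the plane normal in camera coordinates is proportional to Kᵀl, where K is the intrinsic matrix. The sketch below is an illustration under assumed conventions (camera axes x right, y down, z forward; the sign and naming of roll and pitch; hypothetical function names) and is not the paper's estimator.

```python
import numpy as np

def ground_normal_from_horizon(line, K):
    """Unit ground-plane normal in camera coordinates from a horizon line (a, b, c)."""
    n = K.T @ np.asarray(line, dtype=float)
    if n[1] < 0:                              # resolve the sign ambiguity (assumed: +y)
        n = -n
    return n / np.linalg.norm(n)

def roll_pitch_from_horizon(line, K):
    """Roll and pitch in radians, under the assumed axis conventions."""
    nx, ny, nz = ground_normal_from_horizon(line, K)
    roll = np.arctan2(nx, ny)                 # in-image tilt of the horizon
    pitch = np.arctan2(nz, np.hypot(nx, ny))  # horizon offset from the principal point
    return roll, pitch
```

With a level camera the horizon passes horizontally through the principal point and both angles are zero; shifting the line up or down changes only the pitch, and rotating it about the principal point changes only the roll.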

    Simultaneous Video Stabilization And Moving Object Detection In Turbulence

    Turbulence mitigation refers to the stabilization of videos with nonuniform deformations due to the influence of optical turbulence. Typical approaches for turbulence mitigation follow averaging or dewarping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects, which can often be of great interest. In this paper, we address the novel problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulence sequence into three components: the background, the turbulence, and the object. We simplify this extremely difficult problem into a minimization of nuclear norm, Frobenius norm, and ℓ1 norm. Our method is based on two observations: First, the turbulence causes dense and Gaussian noise and therefore can be captured by Frobenius norm, while the moving objects are sparse and thus can be captured by ℓ1 norm. Second, since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted with atmospheric turbulence and include extremely tiny moving objects. © 2012 IEEE
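
One plausible way to write the three-term decomposition described above is given below. The symbols are assumptions, since the abstract does not name them: F is the matrix of vectorized frames, A the low-rank background, O the sparse moving objects, E the dense turbulence, and τ, λ are trade-off weights. The additional constraint from the Gaussian-based turbulence model is not shown, because the abstract does not specify its form.

```latex
\min_{A,\,O,\,E}\; \|A\|_{*} \;+\; \tau \,\|O\|_{1} \;+\; \lambda \,\|E\|_{F}^{2}
\quad \text{subject to} \quad F = A + O + E
```

Each term matches one observation in the abstract: the nuclear norm favors a low-rank background, the ℓ1 norm favors sparse objects, and the Frobenius norm penalizes dense Gaussian-like turbulence.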

    Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories

    Recognition of human actions in a video acquired by a moving camera typically requires standard preprocessing steps such as motion compensation, moving object detection and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicates the tracking stage, resulting in cluttered and incorrect tracks. Therefore, action recognition from a moving camera is considered very challenging. In this paper, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned difficulties. Our approach is based on Lagrangian particle trajectories which are a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene. This is done in frames of unaligned video, and no object detection is required. In order to handle the moving camera, we propose a novel approach based on low rank optimization, where we decompose the trajectories into their camera-induced and object-induced components. Having obtained the relevant object motion trajectories, we compute a compact set of chaotic invariant features which captures the characteristics of the trajectories. Consequently, an SVM is employed to learn and recognize the human actions using the computed motion features. We performed intensive experiments on multiple benchmark datasets and two new aerial datasets called ARG and APHill, and obtained promising results. © 2011 IEEE
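
As a rough illustration of how particle trajectories can be obtained by advecting optical flow, the sketch below seeds particles on a regular grid and moves each one by the flow sampled at its current position. It assumes precomputed flow fields, nearest-neighbor sampling, and hypothetical function names; the paper's own advection scheme may differ. The second helper stacks the trajectories into a matrix of the kind a low-rank decomposition would operate on.

```python
import numpy as np

def advect_particles(flows, step=4):
    """Advect a grid of particles through flow fields.

    flows: array (T, H, W, 2); flows[t, y, x] is the (dx, dy) displacement
    from frame t to frame t + 1. Returns trajectories of shape (N, T + 1, 2)
    holding (x, y) positions.
    """
    T, H, W, _ = flows.shape
    ys, xs = np.mgrid[0:H:step, 0:W:step]
    pos = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    traj = [pos.copy()]
    for t in range(T):
        # Nearest-neighbor lookup of the flow at each particle (assumed scheme).
        xi = np.clip(np.rint(pos[:, 0]).astype(int), 0, W - 1)
        yi = np.clip(np.rint(pos[:, 1]).astype(int), 0, H - 1)
        pos = pos + flows[t, yi, xi]
        traj.append(pos.copy())
    return np.stack(traj, axis=1)

def trajectory_matrix(traj):
    """Stack trajectories as columns: shape (2 * (T + 1), N)."""
    n_particles, n_frames, _ = traj.shape
    return traj.transpose(1, 2, 0).reshape(2 * n_frames, n_particles)
```

When every particle follows the same camera-induced motion, the columns of the trajectory matrix are strongly correlated and the matrix has low rank; independently moving objects show up as deviations from that low-rank structure.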

    Complex Event Recognition Using Constrained Low-Rank Representation

    Complex event recognition is the problem of recognizing events in long and unconstrained videos. In this extremely challenging task, concepts have recently shown a promising direction where core low-level events (referred to as concepts) are annotated and modeled using a portion of the training data, then each complex event is described using concept scores, which are features representing the occurrence confidence for the concepts in the event. However, because of the complex nature of the videos, both the concept models and the corresponding concept scores are significantly noisy. In order to address this problem, we propose a novel low-rank formulation, which combines the precisely annotated videos used to train the concepts, with the rich concept scores. Our approach finds a new representation for each event, which is not only low-rank, but also constrained to adhere to the concept annotation, thus suppressing the noise, and maintaining a consistent occurrence of the concepts in each event. Extensive experiments on the large-scale, real-world TRECVID Multimedia Event Detection 2011 and 2012 datasets demonstrate that our approach consistently improves the discriminativity of the concept scores by a significant margin.

    Human Identity Recognition In Aerial Images

    Human identity recognition is an important yet under-addressed problem. Previous methods were strictly limited to high quality photographs, where the principal techniques heavily rely on body details such as face detection. In this paper, we propose an algorithm to address the novel problem of human identity recognition over a set of unordered low quality aerial images. Assuming a user was able to manually locate a target in some images of the set, we find the target in each other query image by implementing a weighted voter-candidate formulation. In the framework, every manually located target is a voter, and the set of humans in a query image are candidates. In order to locate the target, we detect and align blobs of voters and candidates. We then use PageRank to extract distinguishing regions, and match multiple regions of a voter to multiple regions of a candidate using Earth Mover Distance (EMD). This generates a robust similarity measure between every voter-candidate pair. Finally, we identify the candidate with the highest weighted vote as the target. We tested our technique over several aerial image sets that we collected, along with publicly available sets, and have obtained promising results. © 2010 IEEE